Does choice in model selection affect maximum likelihood analysis?
نویسندگان
چکیده
In order to have confidence in model-based phylogenetic analysis, the model of nucleotide substitution adopted must be selected in a statistically rigorous manner. Several model-selection methods are applicable to maximum likelihood (ML) analysis, including the hierarchical likelihood-ratio test (hLRT), Akaike information criterion (AIC), Bayesian information criterion (BIC), and decision theory (DT), but their performance relative to empirical data has not been investigated thoroughly. In this study, we use 250 phylogenetic data sets obtained from TreeBASE to examine the effects that choice in model selection has on ML estimation of phylogeny, with an emphasis on optimal topology, bootstrap support, and hypothesis testing. We show that the use of different methods leads to the selection of two or more models for approximately 80% of the data sets and that the AIC typically selects more complex models than alternative approaches. Although ML estimation with different best-fit models results in incongruent tree topologies approximately 50% of the time, these differences are primarily attributable to alternative resolutions of poorly supported nodes. Furthermore, topologies and bootstrap values estimated with ML using alternative statistically supported models are more similar to each other than to topologies and bootstrap values estimated with ML under the Kimura two-parameter (K2P) model or maximum parsimony (MP). In addition, Swofford-Olsen-Waddell-Hillis (SOWH) tests indicate that ML trees estimated with alternative best-fit models are usually not significantly different from each other when evaluated with the same model. However, ML trees estimated with statistically supported models are often significantly suboptimal to ML trees made with the K2P model when both are evaluated with K2P, indicating that not all models perform in an equivalent manner. Nevertheless, the use of alternative statistically supported models generally does not affect tests of monophyletic relationships under either the Shimodaira-Hasegawa (S-H) or SOWH methods. Our results suggest that although choice in model selection has a strong impact on optimal tree topology, it rarely affects evolutionary inferences drawn from the data because differences are mainly confined to poorly supported nodes. Moreover, since ML with alternative best-fit models tends to produce more similar estimates of phylogeny than ML under the K2P model or MP, the use of any statistically based model-selection method is vastly preferable to forgoing the model-selection process altogether.
منابع مشابه
Vector Autoregressive Model Selection: Gross Domestic Product and Europe Oil Prices Data Modelling
We consider the problem of model selection in vector autoregressive model with Normal innovation. Tests such as Vuong's and Cox's tests are provided for order and model selection, i.e. for selecting the order and a suitable subset of regressors, in vector autoregressive model. We propose a test as a modified log-likelihood ratio test for selecting subsets of regressors. The Europe oil prices, ...
متن کاملFlood Flow Frequency Model Selection Using L-moment Method in Arid and Semi Arid Regions of Iran
Statistical frequency analysis is the most common procedure for the analysis of flood data at a gauged location thatin first step it is needed to select a model to represent the population. Among them, the central moment has been themost common and widely used, and with the using of computers, the application of the maximum likelihood hasincreased. This research was carried out in order to reco...
متن کاملEmpirical problems of the hierarchical likelihood ratio test for model selection.
Advocates of maximum likelihood (ML) approaches to phylogenetics commonly cite as one of their primary advantages the use of objective statistical criteria for model selection. Currently, a particular implementation of the likelihood ratio test (LRT) is the most commonly used model-selection criterion in phylogenetics. This approach requires the choice of a starting point and a parameter additi...
متن کاملAnalysis of a Problem Using Various Visions
In this paper an applied problem, where the response of interest is the number of success in a specific experiment, is considered and by various visions is studied. The effects of outlier values of response on results of a regression analysis are so important to be studied. For this reason, using diagnostic methods, outlier response values are recognized. It is shown that use of arc-sine ...
متن کاملModified Maximum Likelihood Estimation in First-Order Autoregressive Moving Average Models with some Non-Normal Residuals
When modeling time series data using autoregressive-moving average processes, it is a common practice to presume that the residuals are normally distributed. However, sometimes we encounter non-normal residuals and asymmetry of data marginal distribution. Despite widespread use of pure autoregressive processes for modeling non-normal time series, the autoregressive-moving average models have le...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Systematic biology
دوره 57 1 شماره
صفحات -
تاریخ انتشار 2008